A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data
Authors
Abstract
MOTIVATION The false discovery rate (FDR) is defined as the expected proportion of false positives among all claimed positives. In practice the true FDR is unknown, so an estimated FDR can serve as a criterion for evaluating statistical methods, provided that the estimate approximates the true FDR well or, at least, does not improperly favor or disfavor any particular method. Permutation methods have become a popular way to estimate FDR in genomic studies. The purpose of this paper is twofold. First, we investigate theoretically and empirically whether the standard permutation-based FDR estimator is biased and, if so, whether the bias inappropriately favors or disfavors any method. Second, we propose a simple modification of the standard permutation that yields a better FDR estimator, which can in turn serve as a fairer criterion for evaluating statistical methods.

RESULTS Both simulated and real data examples are used for illustration and comparison. Three commonly used test statistics are considered: the sample mean, the SAM statistic and Student's t-statistic. The results show that the standard permutation method overestimates FDR. The overestimation is most severe for the sample mean statistic and least severe for the t-statistic, with the SAM statistic lying between the two extremes; one therefore has to be cautious when using standard permutation-based FDR estimates to evaluate statistical methods. In addition, our proposed FDR estimation method is simple and outperforms the standard method.
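The standard permutation estimator discussed in the abstract can be sketched as follows. For a cutoff c, it divides the average number of genes whose permuted statistic satisfies |T*| >= c by the number of genes whose observed statistic satisfies |T| >= c. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the null count is averaged over permutations (some implementations use the median instead), permutations are drawn at random rather than enumerated, and the SAM-style statistic uses a fixed, hypothetical fudge constant s0, whereas SAM itself selects s0 from the data. Only the standard estimator is shown; the authors' proposed modification is not specified in this abstract and is not reproduced here.

```python
import math
import random
import statistics


def mean_diff(x, y):
    """Sample-mean statistic: difference of the two group means."""
    return statistics.fmean(x) - statistics.fmean(y)


def t_stat(x, y, s0=0.0):
    """Two-sample pooled-variance t-statistic.

    With s0 > 0 this becomes a SAM-style statistic
    d = (mean(x) - mean(y)) / (s + s0). Here s0 is a fixed,
    hypothetical constant; SAM itself chooses s0 from the data.
    """
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    s = math.sqrt(pooled * (1 / nx + 1 / ny))
    return (statistics.fmean(x) - statistics.fmean(y)) / (s + s0)


def count_calls(data, cols_x, cols_y, stat, threshold):
    """Number of genes whose |statistic| reaches the threshold."""
    calls = 0
    for gene in data:
        value = stat([gene[j] for j in cols_x], [gene[j] for j in cols_y])
        if abs(value) >= threshold:
            calls += 1
    return calls


def permutation_fdr(data, n_x, stat, threshold, n_perm=200, seed=0):
    """Standard permutation FDR estimate at the cutoff |stat| >= threshold.

    data: one row per gene; the first n_x columns are group 1 and the
    remaining columns are group 2. The same column permutation is applied
    to every gene, which keeps gene-gene correlation intact.
    """
    cols = list(range(len(data[0])))
    observed = count_calls(data, cols[:n_x], cols[n_x:], stat, threshold)
    if observed == 0:
        return 0.0  # nothing is called, so there is nothing to estimate
    rng = random.Random(seed)
    null_counts = []
    for _ in range(n_perm):
        perm = cols[:]
        rng.shuffle(perm)  # random relabelling of the arrays
        null_counts.append(
            count_calls(data, perm[:n_x], perm[n_x:], stat, threshold))
    # Average number of null calls divided by observed calls, capped at 1.
    return min(1.0, statistics.fmean(null_counts) / observed)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: 90 null genes and 10 genes shifted by 2 in group 1.
    genes = []
    for g in range(100):
        shift = 2.0 if g < 10 else 0.0
        genes.append([rng.gauss(shift, 1) for _ in range(4)]
                     + [rng.gauss(0, 1) for _ in range(4)])
    candidates = {
        "mean": mean_diff,
        "SAM-style (s0=0.5)": lambda x, y: t_stat(x, y, s0=0.5),
        "t": t_stat,
    }
    for name, stat in candidates.items():
        # Threshold at the 10th largest |statistic| so that each method
        # is evaluated at (about) the same number of claimed positives.
        scores = sorted((abs(stat(g[:4], g[4:])) for g in genes), reverse=True)
        print(name, round(permutation_fdr(genes, 4, stat, scores[9]), 3))
```

The demo thresholds each statistic at its own top-10 cutoff because the three statistics live on different scales; comparing at a common number of calls keeps the estimates comparable. The simulated data are hypothetical, and the printed values are not a reproduction of the paper's results.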
Similar resources
The False Discovery Rate in Simultaneous Fisher and Adjusted Permutation Hypothesis Testing on Microarray Data
Background and Objectives: In recent years, new technologies have led to the production of large amounts of data, and in biology, microarray technology has also developed dramatically. Meanwhile, the Fisher test is used to compare the control group with two or more experimental groups and to detect differentially expressed genes. In this study, the false discovery rate was investiga...
Detecting Differentially Expressed Genes Using Calibrated Bayes Factors
A common interest in microarray data analysis is to identify genes having changes in expression values between different biological conditions. The existing methods include using two-sample t-statistics, modified t-statistics (SAM), Bayesian t-statistics (Cyber-T), semiparametric hierarchical Bayesian models, and nonparametric permutation tests. All these methods essentially compare two populat...
Estimating p-values in small microarray experiments
MOTIVATION Microarray data typically have small numbers of observations per gene, which can result in low power for statistical tests. Test statistics that borrow information from data across all of the genes can improve power, but these statistics have non-standard distributions, and their significance must be assessed using permutation analysis. When sample sizes are small, the number of dist...
Comments on the analysis of unbalanced microarray data
MOTIVATION Permutation testing is very popular for analyzing microarray data to identify differentially expressed (DE) genes; estimating false discovery rates (FDRs) is a very popular way to address the inherent multiple testing problem. However, combining these approaches may be problematic when sample sizes are unequal. RESULTS With unbalanced data, permutation tests may not be suitable bec...
Some Comments on Instability of False Discovery Rate Estimation
Some extended false discovery rate (FDR) controlling multiple testing procedures rely heavily on empirical estimates of the FDR constructed from gene expression data. Such estimates are also used as performance indicators when comparing different methods for microarray data analysis. The present communication shows that the variance of the proposed estimators may be intolerably high, the correl...
Journal title:
- Bioinformatics
Volume 21, Issue 23
Pages: -
Publication date: 2005